Semantic Annotation Layer in Russian National Corpus: Lexical Classes of Nouns and Adjectives
Abstract
This paper describes a project carried out within the Russian National Corpus (RNC; http://www.ruscorpora.ru). Besides such obligatory constituents of a linguistic corpus as part-of-speech (POS) and morphological tagging, the RNC contains semantic annotation. Six classifications are involved in the tagging: category, taxonomy, mereology, topology, evaluation, and derivational classes. The operation of the context semantic rules is shown by applying them to various polysemous nouns and adjectives. Our results demonstrate that semantic tags incorporated in the context are highly effective for word sense disambiguation (WSD).
Similar resources
Identification of context markers for Russian nouns
The research project presented in this paper aims at identification of context markers for Russian nouns and their use in construction identification. The body of contexts has been extracted from the Russian National Corpus (RNC). The context processing procedure takes into account the lexical and semantic information represented in the corpus annotation. Merged meaning of words are taken into ...
Using Semantic Annotations to Cluster Lexical Relationships
This report describes work that builds on the Semantic Annotation project from the 2003 Johns Hopkins University summer workshop. This study investigates the automatic derivation of preferences for both adjectives and verbs from the 26-million-word semantically annotated corpus produced as part of the workshop. This corpus was enhanced by identifying additional named entities in the text. ...
On the Role of Derivational Processes in the Formation of Non-Taxonomic Classes of Lexical Units in Russian
The paper is focused on classes of lexical units which arise as a result of derivational processes – word formation and semantic transfers, acting either in isolation or together, on the basis of common semantic foundations that bind targets and sources of derivation. The lexical items which constitute the classes under study vary in their denotative characteristics and due to their categ...
FrameBank: A Database of Russian Lexical Constructions
Russian FrameBank is a bank of annotated samples from the Russian National Corpus which documents the use of lexical constructions (e.g. argument constructions of verbs and nouns). FrameBank belongs to FrameNet-oriented resources, but unlike Berkeley FrameNet it focuses more on the morphosyntactic and semantic features of individual lexemes rather than the generalized frames, following the theor...
Semantic Annotation of Verbs for the Tatar Corpus
This paper discusses the problem of developing the metalanguage for linguistic applications and introduces a tag set for the semantic annotation of verbs for the Tatar National Corpus. At present, there are no generally accepted standards for the development of corpus semantic annotation. In many cases, it is made by individual researchers or teams for one or another research project, and chara...